Large Scale Loss of Data in Low-Diversity Illumina Sequencing Libraries Can Be Recovered by Deferred Cluster Calling
نویسندگان
چکیده
Massively parallel DNA sequencing is capable of sequencing tens of millions of DNA fragments at the same time. However, sequence bias in the initial cycles, which are used to determine the coordinates of individual clusters, causes a loss of fidelity in cluster identification on Illumina Genome Analysers. This can result in a significant reduction in the numbers of clusters that can be analysed. Such low sample diversity is an intrinsic problem of sequencing libraries that are generated by restriction enzyme digestion, such as e4C-seq or reduced-representation libraries. Similarly, this problem can also arise through the combined sequencing of barcoded, multiplexed libraries. We describe a procedure to defer the mapping of cluster coordinates until low-diversity sequences have been passed. This simple procedure can recover substantial amounts of next generation sequencing data that would otherwise be lost.
منابع مشابه
Strategies for Achieving High Sequencing Accuracy for Low Diversity Samples and Avoiding Sample Bleeding Using Illumina Platform
Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes result in sequencing libraries with low initial sequence diversity. Sequencing such data on the Illumina platform typically produces low quality data due to the limitations of the Illumina cluster calling algorithm. Moreover, even in the case of diverse samples, these li...
متن کاملAn adaptive decorrelation method removes Illumina DNA base-calling errors caused by crosstalk between adjacent clusters
Base-calling accuracy is crucial for high-throughput DNA sequencing and downstream analysis such as read mapping and genome assembly. Accordingly, we made an endeavor to reduce DNA sequencing errors of Illumina systems by correcting three kinds of crosstalk in the cluster intensity data. We discovered that signal crosstalk between adjacent clusters accounts for a large portion of sequencing err...
متن کاملDeep sequencing analysis of phage libraries using Illumina platform.
This paper presents an analysis of phage-displayed libraries of peptides using Illumina. We describe steps for the preparation of short DNA fragments for deep sequencing and MatLab software for the analysis of the results. Screening of peptide libraries displayed on the surface of bacteriophage (phage display) can be used to discover peptides that bind to any target. The key step in this discov...
متن کاملCorrection: A Next-Generation Sequencing Method for Genotyping-by-Sequencing of Highly Heterozygous Autotetraploid Potato
Assessment of genomic DNA sequence variation and genotype calling in autotetraploids implies the ability to distinguish among five possible alternative allele copy number states. This study demonstrates the accuracy of genotyping-by-sequencing (GBS) of a large collection of autotetraploid potato cultivars using next-generation sequencing. It is still costly to reach sufficient read depths on a ...
متن کاملLarge Scale Identification of SSR Molecular Markers in Ajowan (Trachyspermum ammi) Using RNA Sequencing
The medicinal plant, Trachyspermum ammi is a rich source of active pharmaceutical ingredients with pharmaceutics effects. Microsatellite markers play a key role in the genome and gene expression, especially in secondary metabolite biosynthesis in medicinal plants. For the first time, transcriptome sequencing of this herb medicine was carried out to identify the microsatellite markers of this sp...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 6 شماره
صفحات -
تاریخ انتشار 2011